Improving the Robustness of Relevance-Based Language Models
Abstract
We propose a new robust relevance model for document retrieval in the language-modeling framework that can be applied to both pseudo feedback and true relevance feedback. Our model differs from the Lavrenko-Croft relevance model in three main ways. First, the query is treated as a short, special document and included when approximating the relevance model. It is used alongside the top-ranked documents returned by the first-round retrieval (for pseudo feedback) or a set of judged relevant documents (for true relevance feedback). Second, the original relevance model uses a uniform prior, whereas our model assigns documents different priors according to their lengths (in terms) and their ranks in the first-round retrieval. Third, the probability of a term in the relevance model is further adjusted by its probability in the background language model. We applied the new model to both pseudo feedback and true relevance feedback. In both cases we compared its performance with that of two baselines: the original relevance model and a linear combination model. Experiments were carried out with TREC title queries 101 to 200 on the AP collections and queries 301 to 400 on a heterogeneous collection consisting of the data from TREC disks 4 and 5. The results show that the proposed model outperforms both baselines in terms of mean average precision. Furthermore, for pseudo feedback it is less sensitive to the number of feedback documents than the two baseline models. For true relevance feedback it achieves better performance than the two baselines while using fewer relevant documents.
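The abstract does not give the exact estimation formulas. The following is therefore only a minimal Python sketch under stated assumptions. It starts from the standard Lavrenko-Croft (RM1) estimate, P(w|R) ∝ Σ_D P(D) · P(w|D) · Π_q P(q|D), summed over the feedback documents D, and shows where each of the three modifications described above could plug in. Several elements are hypothetical illustrative choices, not the paper's formulas:

- the function names `dirichlet_lm` and `relevance_model`;
- Dirichlet smoothing of the document models;
- the prior form P(D) ∝ |D| / rank;
- the background adjustment P(w|R) / P(w|C)^beta.

```python
import math
from collections import Counter


def dirichlet_lm(terms, collection_lm, mu=1000.0):
    """Return w -> P(w|D), Dirichlet-smoothed against the collection model (assumed choice)."""
    counts = Counter(terms)
    length = len(terms)

    def prob(w):
        # Terms missing from the collection model get a tiny floor so logs stay finite.
        return (counts[w] + mu * collection_lm.get(w, 1e-9)) / (length + mu)

    return prob


def relevance_model(query, feedback_docs, collection_lm, *, include_query=True,
                    length_rank_prior=True, beta=0.5, mu=1000.0, top_terms=None):
    """Hypothetical RM1-style estimate with the three adjustments exposed as switches.

    query: list of query terms.
    feedback_docs: list of term lists in first-round rank order (top-ranked
        documents for pseudo feedback, judged relevant documents for true feedback).
    collection_lm: dict mapping term -> P(w|C), the background language model.
    Returns (term, probability) pairs sorted by decreasing probability.
    """
    # (1) Optionally treat the query as a short, special document ranked first.
    docs = ([list(query)] if include_query else []) + [list(d) for d in feedback_docs]
    if not docs:
        return []

    # (2) Uniform prior (original model) or an assumed length-and-rank prior,
    # P(D) proportional to |D| / rank with ranks starting at 1.
    if length_rank_prior:
        priors = [max(len(d), 1) / (rank + 1) for rank, d in enumerate(docs)]
    else:
        priors = [1.0] * len(docs)

    lms = [dirichlet_lm(d, collection_lm, mu) for d in docs]

    # Document weight P(D) * prod_q P(q|D), computed in log space and then
    # normalised over the feedback set to avoid floating-point underflow.
    log_w = [math.log(p) + sum(math.log(lm(q)) for q in query)
             for p, lm in zip(priors, lms)]
    top = max(log_w)
    weights = [math.exp(x - top) for x in log_w]
    z = sum(weights)
    weights = [x / z for x in weights]

    # P(w|R) = sum_D weight(D) * P(w|D), restricted to the feedback vocabulary.
    vocab = {w for d in docs for w in d}
    rm = {w: sum(wt * lm(w) for wt, lm in zip(weights, lms)) for w in vocab}

    # (3) Hypothetical background adjustment: divide by P(w|C) ** beta so that
    # terms common across the whole collection are demoted (beta = 0 disables it).
    if beta > 0:
        rm = {w: p / collection_lm.get(w, 1e-9) ** beta for w, p in rm.items()}

    total = sum(rm.values())
    ranked = sorted(((w, p / total) for w, p in rm.items()),
                    key=lambda kv: kv[1], reverse=True)
    return ranked[:top_terms] if top_terms else ranked
```

Setting `include_query=False`, `length_rank_prior=False`, and `beta=0.0` reduces the sketch to a plain RM1 estimate over the feedback vocabulary. That makes it easy to run the original model as a baseline with the same code. For true relevance feedback, `feedback_docs` would hold the judged relevant documents instead of the top-ranked ones.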
Similar resources
Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles
When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human-level intelligence, it is inevitable to use knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...
A Latent Dirichlet Framework for Relevance Modeling
Relevance-based language models operate by estimating the probabilities of observing words in documents relevant (or pseudo relevant) to a topic. However, these models assume that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. This could limit model robustness and effectiveness. In this study, we propose a Latent Dirichlet relevance model, whic...
Enhancing Relevance Models with Adaptive Passage Retrieval
Passage retrieval and pseudo relevance feedback/query expansion have been reported in the literature as two effective means of improving document retrieval. Relevance models, while improving retrieval in most cases, hurt performance on some heterogeneous collections. Previous research has shown that combining passage-level evidence with pseudo relevance feedback brings added benefits. In this pap...
Score distributions for Pseudo Relevance Feedback
Relevance-Based Language Models, commonly known as Relevance Models, are successful approaches to explicitly introducing the concept of relevance into the statistical Language Modelling framework of Information Retrieval. These models achieve state-of-the-art retrieval performance in the Pseudo Relevance Feedback task. It is known that one of the factors that most affect the Pseudo Relevance Fee...
The value relevance of accounting disclosures among listed Nigerian firms: IFRS adoption
This study determined the value relevance of assets and liabilities after the adoption of IFRS among listed Nigerian firms. The Ohlson (1995) model of stock price regressions, which has been widely adopted by accounting researchers, was used to test the relationship between assets and liabilities and the stock price. A sample of 126 firms listed on the Nigerian stock market is used for the study. Data is co...
Publication date: 2005